Metropolis

We first explain the algorithm with an easy problem.
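As a minimal sketch of the idea (a toy setup assumed for illustration, not from the original text): estimate a single mean \(\small \mu\) with a weak Normal(0, 10) prior using a random-walk Metropolis sampler.

```r
# Toy problem (hypothetical): y ~ Normal(mu, 1), prior mu ~ Normal(0, 10)
set.seed(42)
y = rnorm(50, mean = 3, sd = 1)

log_post = function(mu) {
  sum(dnorm(y, mean = mu, sd = 1, log = TRUE)) + dnorm(mu, 0, 10, log = TRUE)
}

n_samples = 5000
mu_chain = numeric(n_samples)  # chain starts at 0
for (i in 2:n_samples) {
  proposal = mu_chain[i - 1] + rnorm(1, 0, 0.5)  # symmetric random-walk proposal
  log_r = log_post(proposal) - log_post(mu_chain[i - 1])
  # accept with probability min(1, r); otherwise keep the current value
  mu_chain[i] = if (log(runif(1)) < log_r) proposal else mu_chain[i - 1]
}
mean(mu_chain)  # close to the sample mean of y
```

Because the prior is weak, the chain's mean should land very close to the sample mean of `y`.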

Why Hamiltonian Monte Carlo

The Metropolis algorithm and improved versions like Gibbs sampling generally perform well, but they run into problems when parameters are correlated.

Take, for example, these data with high collinearity, akin to the two-legs example:

set.seed(1)
# two predictors with correlation 0.95
Sigma = matrix(c(1, .95, .95, 1), nrow = 2)
X = MASS::mvrnorm(100, mu = c(0, 0), Sigma = Sigma)
# outcome depends on both predictors equally
Y = X %*% c(1, 1) + rnorm(100)
XY = cbind(X1 = X[,1], X2 = X[,2], Y = Y)
colnames(XY)[3] = "Y"  # cbind() drops the name because Y is a matrix
par(mar = c(3,3,2,1), mgp = c(1.25,.5,0), tck = -.01)
pairs(XY)

If we estimate the model \(\small Y \sim \text{Normal}(\beta_1 X_1 + \beta_2 X_2, \sigma)\), the coefficients \(\small \beta_1\) and \(\small \beta_2\) are highly correlated.
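A quick way to see this (a sketch; the simulated data are regenerated here so the snippet runs on its own) is to fit the model with `lm()` and inspect the correlation of the coefficient estimates:

```r
# regenerate the simulated data from above so this snippet is self-contained
set.seed(1)
Sigma = matrix(c(1, .95, .95, 1), nrow = 2)
X = MASS::mvrnorm(100, mu = c(0, 0), Sigma = Sigma)
Y = X %*% c(1, 1) + rnorm(100)
d = data.frame(X1 = X[, 1], X2 = X[, 2], Y = Y)

fit = lm(Y ~ X1 + X2, data = d)
# convert the coefficient covariance matrix to a correlation matrix
cov2cor(vcov(fit))["X1", "X2"]  # strongly negative, close to -0.95
```

Strong positive correlation between the predictors translates into strong negative correlation between their coefficient estimates.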

Let's see what the Metropolis sampler does with this:

Metropolis sampler for the hard problem
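A hypothetical random-walk Metropolis sampler for this regression could look as follows (a sketch only: \(\small \sigma\) is fixed at 1 for simplicity, and the data are regenerated so the snippet is self-contained):

```r
set.seed(1)
Sigma = matrix(c(1, .95, .95, 1), nrow = 2)
X = MASS::mvrnorm(100, mu = c(0, 0), Sigma = Sigma)
Y = as.vector(X %*% c(1, 1) + rnorm(100))

# log posterior of (beta1, beta2) with vague Normal(0, 10) priors, sigma = 1
log_post = function(beta) {
  sum(dnorm(Y, mean = X %*% beta, sd = 1, log = TRUE)) +
    sum(dnorm(beta, 0, 10, log = TRUE))
}

n_iter = 10000
beta = matrix(0, n_iter, 2)  # chain starts at (0, 0)
accepted = 0
for (i in 2:n_iter) {
  proposal = beta[i - 1, ] + rnorm(2, 0, 0.1)  # isotropic random-walk step
  if (log(runif(1)) < log_post(proposal) - log_post(beta[i - 1, ])) {
    beta[i, ] = proposal
    accepted = accepted + 1
  } else {
    beta[i, ] = beta[i - 1, ]
  }
}
accepted / n_iter       # acceptance rate
plot(beta, type = "l",  # the chain crawls along a narrow diagonal ridge
     xlab = expression(beta[1]), ylab = expression(beta[2]))
```

Because the posterior is a narrow diagonal ridge, the isotropic proposal either steps off the ridge (and is rejected) or takes tiny steps along it, so the chain mixes slowly.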

We can observe the following things:

This is just one example where Metropolis or Gibbs samplers can have difficulties.

Fitting models with Stan and ulam

One sampling algorithm that does much better than Metropolis or Gibbs is Hamiltonian Monte Carlo. On an intuitive level, the key difference in favor of Hamiltonian Monte Carlo is that it generates proposals less randomly than either Metropolis or Gibbs. Less randomly here means that the direction from the current sample to the proposal is not random, but that proposals are more likely in the direction of the bulk of the posterior distribution.
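To make this concrete, here is a sketch of a single HMC transition via leapfrog integration (assuming a 2D correlated Gaussian as a stand-in for the posterior above; the step size `eps` and path length `L` are arbitrary illustrative choices):

```r
Sigma = matrix(c(1, .95, .95, 1), nrow = 2)
Sigma_inv = solve(Sigma)
U = function(q) 0.5 * drop(t(q) %*% Sigma_inv %*% q)  # potential energy = -log density
grad_U = function(q) drop(Sigma_inv %*% q)

hmc_step = function(q, eps = 0.05, L = 30) {
  p = rnorm(2)                          # resample momentum
  q_new = q
  p_new = p - eps / 2 * grad_U(q_new)   # half step for momentum
  for (l in 1:L) {
    q_new = q_new + eps * p_new         # full step for position
    if (l < L) p_new = p_new - eps * grad_U(q_new)
  }
  p_new = p_new - eps / 2 * grad_U(q_new)  # final half step for momentum
  # Metropolis correction on the total energy (Hamiltonian)
  dH = U(q) - U(q_new) + sum(p^2) / 2 - sum(p_new^2) / 2
  if (log(runif(1)) < dH) q_new else q
}

set.seed(2)
q = c(2, -2)                  # start far out in a low-probability corner
for (i in 1:100) q = hmc_step(q)
q                             # now well inside the bulk of the target
```

The gradient steers each leapfrog step toward higher-density regions, which is why the proposals follow the shape of the distribution instead of wandering randomly.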

To clarify the relationship of different terms:

Simulating good posteriors

High n_eff

Many chains: Why and how many?

Bad models make bad posteriors

(wild chain)